Quality Impact of Value Matching and Scoring in Top-k Entity Attribute Extraction
Abstract
The entity attribute extraction problem, or how to extract entities and their attribute values from natural language Web documents, is of critical importance for Web search and information access in general. Unfortunately, because of the noisy nature of the Web and its scale, entity attribute extraction is notoriously challenging in terms of both extraction efficiency and quality. In our earlier work [24], we proposed a top-k extraction processing approach that addressed the efficiency challenge: our approach leveraged a popularity-based scoring function to rank Web pages according to their entity-specific importance, and focused the extraction effort on the highly ranked pages for each entity of interest. The extraction quality resulting from this efficiency-motivated approach, however, has not been studied and is the focus of this paper. Specifically, we make progress toward addressing the quality challenge through an in-depth analysis of two critical components of the extraction process, namely, the matching and scoring of extracted attribute values. The design choices for these components can substantially impact the quality of the entity attribute extraction process, as we demonstrate in experiments with a state-of-the-art extraction system and entities from two domains of interest.
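The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the page scores, the `normalize` matching rule (case- and whitespace-insensitive equality), and the scoring rule (summing the popularity scores of supporting pages) are all assumptions chosen only to show where the matching and scoring design choices enter.

```python
import heapq
from collections import defaultdict


def top_k_pages(page_scores, k):
    """Return the k page ids with the highest (assumed) popularity score."""
    return heapq.nlargest(k, page_scores, key=page_scores.get)


def normalize(value):
    """Hypothetical matching rule: ignore case and extra whitespace."""
    return " ".join(value.lower().split())


def score_values(extractions, page_scores):
    """Hypothetical scoring rule: sum the scores of pages supporting a value."""
    totals = defaultdict(float)
    for page_id, value in extractions:
        totals[normalize(value)] += page_scores[page_id]
    # Highest-scoring candidate value first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: four pages for one entity, one attribute.
page_scores = {"p1": 0.9, "p2": 0.5, "p3": 0.4, "p4": 0.1}
raw = [("p1", "New York"), ("p2", "Boston"),
       ("p3", "new  york"), ("p4", "Boston")]

# Restrict extraction to the top-k pages, then match and score the values.
selected = top_k_pages(page_scores, k=3)
extractions = [(p, v) for p, v in raw if p in selected]
ranked = score_values(extractions, page_scores)
print(ranked)
```

With a stricter matching rule, "New York" and "new  york" would count as different values and their scores would not be combined, which is the kind of design choice whose effect on quality the paper examines.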
Similar Resources
Bootstrapped Named Entity Recognition for Product Attribute Extraction
We present a named entity recognition (NER) system for extracting product attributes and values from listing titles. Information extraction from short listing titles presents a unique challenge, given the lack of informative context and grammatical structure. In this work, we combine supervised NER with bootstrapping to expand the seed list, and output normalized results. Focusing on listings fro...
Privacy and Efficiency Tradeoffs for Multiword Top K Search with Linear Additive Rank Scoring
This paper proposes a private ranking scheme with linear additive scoring for efficient top K keyword search on modest-sized cloud datasets. This scheme strikes for tradeoffs between privacy and efficiency by proposing single-round client-server collaboration with server-side partial ranking based on blinded feature weights with random masks. Client-side preprocessing includes query decompositi...
Ranked Join Indices
A plethora of data sources contain data entities that could be ordered according to a variety of attributes associated with the entities. Such orderings result effectively in a ranking of the entities according to the values in the attribute domain. Commonly, users correlate such sources for query processing purposes through join operations. In query processing, it is desirable to incorporate u...
The Extraction of Influencing Indicators for Scoring of Insurance Companies Branches Based on GMDH Neural Network
One of the key topics and the most important tools to determine the strengths, weaknesses, opportunities and threats of each organization and company is the evaluation of the performance of organizational activities that rating and ranking follows the internal and external goals. In this regard insurance companies similarly are looking for evaluation of their branches through scoring, ...
The impact of health transformation plan on health services fees: brief report
Background: Tariff setting in healthcare is an important control knob affecting the quality, access and cost of services. As part of Iran Health Transformation Plan (HTP) in 2014, the relative value of health care and services was increased to motivate healthcare providers to deliver high quality services. This study aimed to examine the impact of HTP on health services tariffs. Methods: This ...